[57] ABSTRACT

A method and system for speculatively sourcing cache
memory data within a multiprocessor data-processing system
is disclosed. In accordance with the method and system
of the present invention, the data-processing system has
multiple processing units, each of the processing units
including at least one cache memory. In response to a
request for data by a first processing unit within the
data-processing system, an intervention response is issued from
a second processing unit within the data-processing system
that contains the requested data. The requested data is then
read from a cache memory within the second processing unit
before a combined response from all the processing units
returns to the second processing unit.

16 Claims, 3 Drawing Sheets
METHOD AND SYSTEM FOR
SPECULATIVELY ACCESSING CACHE
MEMORY DATA WITHIN A
MULTIPROCESSOR DATA-PROCESSING
SYSTEM USING A CACHE CONTROLLER

BACKGROUND OF THE INVENTION 

1. Technical Field

The present invention relates to a method and system for
sharing data among cache memories in general and, in
particular, to a method and system for sharing data among
cache memories within a multiprocessor data-processing
system. Still more particularly, the present invention relates
to a method and system for speculatively sourcing data
among cache memories within a multiprocessor
data-processing system.

2. Description of the Prior Art

In a symmetric multiprocessor (SMP) data-processing
system, all of the processing units are generally identical;
that is, they all utilize a common set or subset of instructions
and protocols to operate and, generally, have the same
architecture. Each processing unit includes a processor core
having multiple registers and execution units for carrying
out program instructions. Each processing unit also may
have one or more primary caches (i.e., level one or L1
caches), such as an instruction cache and/or a data cache,
which are implemented utilizing high-speed memories. In
addition, each processing unit also may include additional
caches, typically referred to as a secondary cache (i.e., level
two or L2 cache) for supporting the primary caches such as
those mentioned above.

Under an SMP environment, the transfer of data from one
processing unit to another processing unit on a system bus
without going through a system memory is referred to as an
intervention. An intervention protocol improves system
performance by reducing the number of cases in which the
system memory must be accessed in order to satisfy a read
or read-with-intent-to-modify (RWITM) request by any one
of the processing units within the system.

Broadly speaking, when there is an outstanding read/RWITM
request by a processing unit, any one of the other
processing units attached to the system bus that possesses
the requested data within its cache(s) can source the data to
the requesting processing unit. Under the traditional
intervention protocol, the processing unit having the data
residing in its cache will wait for a "combined" response
from all processing units within the system before issuing a
data bus request to source the data from its cache(s).

At the same time, SMP buses also have a "retry"
mechanism, and any read/RWITM request that could be
satisfied by an intervention could also be interrupted by a
"retry" from any one of the processing units on the system
bus. If one processing unit responds with an intervention
while another processing unit responds with a "retry," under
a well-established rule, the retry response automatically
overrules the intervention response. As a result, if there is an
outstanding retry request by any one of the processing units
on the system bus, the processing unit that contains the data
will not issue a data bus request.
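
As a non-limiting illustration, the rule that a retry overrules an intervention can be modeled as picking the snoop response with the highest priority. The sketch below is written in Python, which is not part of the disclosure, and the names SnoopResponse and combine_responses are hypothetical. The priority values for the retry response (1) and the shared intervention response (3) are the ones stated in the detailed description; the values shown for the modified intervention and null responses are assumptions made only so that the example runs.

```python
from enum import Enum


class SnoopResponse(Enum):
    """Coherency responses; the value is the priority (1 = highest)."""
    RETRY = 1                   # priority stated in the description
    MODIFIED_INTERVENTION = 2   # assumed priority, not given in the text
    SHARED_INTERVENTION = 3     # priority stated in the description
    NULL = 4                    # assumed priority, not given in the text


def combine_responses(responses):
    """Return the single combined response seen by every processing unit.

    The response with the numerically lowest priority value wins, so a
    retry from any one unit overrules an intervention from another.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("at least one snoop response is required")
    return min(responses, key=lambda r: r.value)
```

Under these assumptions, combining a shared intervention from one unit with a retry from another yields a retry, which is why the unit holding the data does not issue its data bus request in the traditional protocol.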

Consequently, it would be desirable to provide an
improved sourcing scheme in which intervention data will
be sourced in such a way that is less influenced by the
"retries" from any of the processing units within the
multiprocessor data-processing system.

SUMMARY OF THE INVENTION 
In view of the foregoing, it is therefore an object of the
present invention to provide an improved method and system
for sharing data among cache memories.

It is another object of the present invention to provide an
improved method and system for sharing data among cache
memories within a multiprocessor data-processing system.

It is yet another object of the present invention to provide
an improved method and system for speculatively sourcing
data among cache memories within a multiprocessor
data-processing system.

In accordance with the method and system of the present
invention, a data-processing system has multiple processing
units, each of the processing units including at least one
cache memory. In response to a request for data by a first
processing unit within the data-processing system, an
intervention response is issued from a second processing unit
within the data-processing system that contains the
requested data. The requested data is then read from a cache
memory within the second processing unit before a combined
response from all the processing units returns to the
second processing unit.

All objects, features, and advantages of the present inven- 
tion will become apparent in the following detailed written 
description. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention itself, as well as a preferred mode of use,
further objects, and advantages thereof, will best be understood
by reference to the following detailed description of an
illustrative embodiment when read in conjunction with the
accompanying drawings, wherein:

FIG. 1 is a block diagram of a data-processing system in
which the present invention may be applicable;

FIG. 2 is a block diagram of a three-processor
data-processing system for illustrating a sourcing scheme under
the prior art;

FIG. 3 is a high-level logic flow diagram of a method for
speculatively sourcing data among cache memories within a
multiprocessor data-processing system, in accordance with a
preferred embodiment of the present invention.

DETAILED DESCRIPTION OF A PREFERRED
EMBODIMENT

The present invention may be implemented in any
multiprocessor data-processing system, each processor having at
least one cache memory. Also, it is understood that the
features of the present invention may be applicable in
various multiprocessor data-processing systems, each
processor having a primary cache and a secondary cache.

Referring now to the drawings and, in particular, to FIG. 1,
there is depicted a block diagram of a data-processing
system 10 in which the present invention may be applicable.
Data-processing system 10 includes multiple central processor
units (CPUs) 11a-11n, and each of CPUs 11a-11n
contains a primary cache. As shown, CPU 11a contains a
primary cache 12a, while CPU 11n contains a primary cache
12n. Each of primary caches 12a-12n may be a sectored
cache.

Each of CPUs 11a-11n is coupled to each of secondary
caches 13a-13n, respectively. Each of secondary caches
13a-13n also may be a sectored cache. CPUs 11a-11n,
primary caches 12a-12n, and secondary caches 13a-13n are
connected to each other and to a system memory 14 via an
interconnect 15. Interconnect 15 can be either a bus or a switch.
As a preferred embodiment of the present invention, a
CPU, a primary cache, and a secondary cache, such as CPU
11a, primary cache 12a, and secondary cache 13a as
depicted in FIG. 1, may be collectively known as a processing
unit. Although a preferred embodiment of a data-processing
system is described in FIG. 1, it should be understood that
the present invention can be practiced within a variety of
system configurations. For example, each of CPUs 11a-11n
may have more than two levels of cache memory.

With reference now to Table I, there is illustrated a
number of established coherency responses from a processing
unit under the prior-art intervention protocol. After a
processing unit within the multiprocessor data-processing
system makes a read or read-with-intent-to-modify (RWITM)
request on a system bus, all other processing units within the
system may issue one of the responses in accordance with
Table I after snooping.

TABLE I

[Table I: coherency responses, each with its 3-bit snoop response
encoding, priority, and definition; the table body is not
recoverable from the source.]

As depicted in Table I, the coherency responses take the
form of a 3-bit snoop response signal, with the definition of
each coherency response set forth. These signals are
encoded to indicate the snoop result after the address tenure.
In addition, a priority value is associated with each response
to allow a system logic to determine which of the coherency
responses should take priority when formulating a single
snoop response signal to be returned to all processing units
on the system bus. For example, if a processing unit
responds with a shared intervention response (priority 3),
and another processing unit responds with a retry response
(priority 1), then the processing unit with the retry response
will take priority such that the system logic will return a
retry coherency response to the requesting processing unit as
well as to all other processing units that are attached on the
system bus. This system logic may reside in various
components within the system, such as a system control unit or
a memory controller.

Several well-known mechanisms may be employed to
ascertain which cache (of a processing unit) is the "owner"
of the data that is being requested, and therefore entitled to
source the data. Under the prior-art MESI protocol, if a
cache holds the requested data in a "Modified" or an
"Exclusive" state, that means this cache is the only one
within the system which contains a valid copy of the data
and is clearly the owner. If, however, a cache holds the
requested data in a "Shared" state, that means the data must
also be held in at least one other cache within the system.
Thus, potentially, either one of the two or more caches can
source the data. In such a case, several alternatives are
available to determine which cache should perform sourcing.

Referring now to FIG. 2, there is depicted a block diagram
of a three-processor data-processing system for illustrating a
sourcing scheme under the prior art. As shown, for example,
processing unit 20 desires to make a read or RWITM request
on a system bus 23, and L2 cache of processing unit 21
contains the data being requested by processing unit 20.
Furthermore, the L2 cache within processing unit 20 is in an
"Invalid" state, the L2 cache within processing unit 21 is in
a "Modified" state, and the L2 cache within processing unit
22 is in an "Invalid" state. The subsequent sequence of
actions will be taken by a respective L2 cache controller of
each processing unit for performing the source intervention
as dictated by the prior art.

After processing unit 20 makes its read/RWITM request,
the read/RWITM request is "snooped" from system bus 23
by processing unit 21 and processing unit 22. An L2 cache
directory lookup is performed in each of the processing units
21, 22 to determine whether or not the requested data is
resident in its L2 cache. Because processing unit 21 has the
requested data, an intervention response will be issued by
processing unit 21, and a finite state machine within
processing unit 21 will be dispatched to control the following
actions. If the data within the L2 cache of processing unit 21
is in a "Modified" state, a modified intervention coherency
response will be issued by processing unit 21. Otherwise, if
the data within the L2 cache of processing unit 21 is in a
"Shared" or "Exclusive" state, a shared intervention
coherency response will be issued by processing unit 21. Because
the L2 cache within processing unit 22 is in an "Invalid"
state (or if it does not contain the requested data), a null
coherency response will be issued by processing unit 22.

After the issuance of the intervention response, processing
unit 21 is pending for a combined response which
basically includes, in this example, the coherency response
[...] processing unit 21. If the returned
combined response is a modified intervention coherency
response, processing unit 21 may start sourcing the
requested data from its L2 cache. If processing unit 22
requests a retry for whatever reason, the sourcing must yield
to the retry request (i.e., the sourcing sequence will not
proceed), under the established intervention protocol. For
example, processing unit 22 may be in a snoop queue busy
condition.

If the data in the L2 cache of processing unit 21 has not
been modified or is not resident in the L1 cache (i.e., not L1
inclusive) since the snoop action has been initiated,
processing unit 21 may begin to make a system bus request to the
system bus arbiter (typically, the requested data must be read
into a buffer by the L2 cache controller before the system
bus request can begin). Otherwise, the L1 cache of processing
unit 21 will be flushed and invalidated (i.e., forcing the
cache to "push" any modified data back to the L2 cache
and invalidating the copy in the L1 cache) before any system
bus request can be made. If the L1 cache of processing unit
21 is in a "Shared" state, however, only an invalidation to the
L1 cache is required before making any data bus request.
Processing unit 21 then waits for a system bus grant to
return. The actual data-sourcing will begin after the data bus
grant is received. Once the sourcing has completed, the L2
cache of processing unit 20 will be changed from an
"Invalid" state to a "Shared" state for a read request and to
a "Modified" state for an RWITM request. Contrarily, the L2
cache of processing unit 21 will be changed from a
"Modified" state to a "Shared" state for a read request and to an
"Invalid" state for an RWITM request. There is no change of
state in the cache of processing unit 22.

Referring now to FIG. 3, there is depicted a high-level
logic flow diagram of a method for speculatively sourcing
data among cache memories within a multiprocessor
data-processing system, in accordance with a preferred
embodiment of the present invention. Starting at block 30, a
read/RWITM request is snooped from a system bus by all
processing units within the system, as shown in block 31. An
L2 cache directory lookup is performed by each processing
unit to determine whether or not the requested data
is resident in its L2 cache, as depicted in block 32. A null
coherency response will be issued by all those processing
units that do not possess the requested data (such as
processing unit 22 of FIG. 2), as illustrated in block 33, and the
process exits at block 99. On the other hand, an intervention
coherency response will be issued by a processing unit that
possesses the requested data (such as processing unit 21 of
FIG. 2), as shown in block 34.
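
By way of illustration only, the choice of coherency response made after the L2 directory lookup, and the L2 state changes stated for the FIG. 2 example, can be sketched as two small Python functions. Python and the names snoop_response and fig2_states_after_sourcing are hypothetical and do not appear in the disclosure. The second function covers only the case described for FIG. 2, in which the intervening L2 cache starts in the "Modified" state and the requesting L2 cache starts in the "Invalid" state; other starting states are not addressed by the text and are not modeled here.

```python
def snoop_response(l2_state):
    """Map the L2 state of the snooped line to a coherency response.

    Follows the description: Modified gives a modified intervention,
    Shared or Exclusive gives a shared intervention, and an Invalid
    (or absent) line gives a null response.
    """
    if l2_state == "Modified":
        return "modified intervention"
    if l2_state in ("Shared", "Exclusive"):
        return "shared intervention"
    return "null"


def fig2_states_after_sourcing(request):
    """Return the final L2 states for the FIG. 2 example only.

    Assumes the requester starts Invalid and the intervener starts
    Modified, as in the description of FIG. 2.
    """
    if request == "read":
        return {"requester": "Shared", "intervener": "Shared"}
    if request == "rwitm":
        return {"requester": "Modified", "intervener": "Invalid"}
    raise ValueError("request must be 'read' or 'rwitm'")
```

For instance, snoop_response("Invalid") returns "null", matching the response attributed to processing unit 22, and fig2_states_after_sourcing("rwitm") leaves the intervening cache "Invalid" because ownership of the modified line passes to the requester.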

After the issuance of the intervention coherency response,
the intervening processing unit must perform certain cache
housekeeping tasks, as depicted in block 35. These tasks
include flushing and invalidating the data copy in the L1
cache of the intervening processing unit if the data copy in
the L1 cache has been modified, or simply invalidating the
data copy in the L1 cache of the intervening processing unit
if the data copy in the L1 cache has not been modified.
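
The housekeeping of block 35 can be pictured, purely as an assumed model, with one record per cache line. In the Python sketch below, the CacheLine type and the l1_housekeeping function are hypothetical names, and marking the L2 copy as modified after the push is an assumption about bookkeeping that the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Hypothetical model of one cached copy of the requested data."""
    data: bytes
    valid: bool = True
    modified: bool = False


def l1_housekeeping(l1: CacheLine, l2: CacheLine) -> None:
    """Flush the L1 copy if it was modified, then invalidate it."""
    if l1.valid and l1.modified:
        # Flush: push the newer L1 data back to the L2 cache.
        l2.data = l1.data
        l2.modified = True  # assumption: L2 now holds the modified data
    # In both cases the L1 copy is invalidated before sourcing.
    l1.valid = False
    l1.modified = False
```

After this step the L2 cache holds the most recent copy of the data, which is the copy that is subsequently read into a buffer for sourcing.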

Subsequently, the requested data is read from the L2
cache of the intervening processing unit, preferably to a
buffer, and a request for the system data bus is made to a
system bus arbiter, as illustrated in block 36. A determination
is made as to whether or not the system data bus has
been granted, as shown in block 37. If the system data bus
has not been granted, another determination is made as to
whether or not a combined coherency response has returned
yet, as depicted in block 38. If the combined coherency
response has not returned, the process returns to block 37.

However, if the system bus has been granted, a sourcing
of the requested data from the intervening processing unit may
begin by driving the requested data to the system bus, as
illustrated in block 39. Another determination is made as to
whether or not a combined coherency response has returned
at this point already, as shown in block 40. If the combined
coherency response has not returned yet, the process will
keep waiting for the combined coherency response to return
while continuing the sourcing of the requested data to the
system bus.

After the combined coherency response has returned, a
determination is made as to whether or not the combined
coherency response is a "retry," as depicted in block 41. If
the combined coherency response is a retry, then the system
bus request (from block 36) will be cancelled if the system
bus has not been granted yet or the sourcing of the requested
data will be aborted immediately, as illustrated in block 42.
Even if the sourcing has already been completed at this
point, the results will be discarded due to the retry coherency
response. Otherwise, if the combined coherency response is
not a retry, the sourcing of the requested data will continue,
if it has not been completed, until its completion. Finally, the
status of the L2 cache in both the requesting processing unit
and the intervening processing unit are updated accordingly,
as shown in block 43, and the process exits at block 99.
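
As a hedged illustration of blocks 36 through 42, the following Python sketch steps one intervening processing unit through bus cycles. It is a simplified model rather than the disclosed hardware: the function name speculative_source, the outcome strings, the cycle-numbered arguments, and the assumption that one unit ("beat") of data is driven per granted cycle are all hypothetical.

```python
def speculative_source(grant_cycle: int, response_cycle: int,
                       response: str, beats: int = 4):
    """Simulate speculative sourcing for one intervening unit.

    grant_cycle    -- cycle at which the data bus grant arrives
    response_cycle -- cycle at which the combined response returns
    response       -- the combined response, e.g. "retry"
    beats          -- data units to drive (assumed one per cycle)
    Returns (outcome, beats_driven).
    """
    if beats <= 0:
        raise ValueError("beats must be positive")
    driven = 0
    cycle = 0  # cycle 0: data already buffered, bus already requested
    while True:
        granted = cycle >= grant_cycle            # block 37
        if cycle >= response_cycle:               # blocks 38 and 40
            if response == "retry":               # blocks 41 and 42
                if not granted:
                    return ("bus request cancelled", driven)
                if driven < beats:
                    return ("sourcing aborted", driven)
                return ("result discarded", driven)
            if driven == beats:                   # not a retry: done
                return ("sourced", driven)
        if granted and driven < beats:            # block 39
            driven += 1
        cycle += 1
```

With a grant at cycle 1 and a non-retry combined response at cycle 3, two beats are already on the bus when the response arrives, which is the overlap the speculative scheme is meant to gain; with a retry at cycle 3 instead, the same call reports the sourcing as aborted after two beats.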

As has been described, the present invention provides a
method of providing a speculative sourcing scheme for
sharing data among cache memories within a multiprocessor
data-processing system. Specifically, the present disclosure
describes a novel intervention implementation in which the
requested data is read from the L2 cache of the intervening
processing unit before the combined coherency response has
returned.

The present invention has obvious performance advantages
over the prior art because the delay between a read/RWITM
request on the system bus and the sampling of the
combined response can be several system bus clock cycles.
Hence, by allowing the requested data to be read from the L2
cache of the intervening processing unit before the combined
coherency response is received, the intervention
latency is reduced tremendously and the overall SMP system
performance is significantly improved.

While the invention has been particularly shown and
described with reference to a preferred embodiment, it will
be understood by those skilled in the art that various changes
in form and detail may be made therein without departing
from the spirit and scope of the invention.

What is claimed is: 

1. A method for speculatively accessing data from a cache
memory within a data-processing system having a plurality
of processing units, each of said plurality of processing units
including at least one cache memory, said method comprising
the steps of:

requesting for data by a first processing unit within said
data-processing system; and
in response to an issuance of an intervention response
from a second processing unit within said
data-processing system indicating said second processing
unit intends to source said requested data to a system
bus, reading said requested data from said system bus
prior to a combined response from all of said plurality
of processing units returning to said second processing
unit.

2. The method according to claim 1, wherein said reading
step further includes a step of reading said requested data by
a cache controller.

3. The method according to claim 1, wherein said reading
step further includes a step of reading said requested data to
a buffer.

4. The method according to claim 1, wherein said step of
requesting for data includes a step of issuing a read request
or a read-with-intent-to-modify request.

5. The method according to claim 1, wherein said
intervention response from a second processing unit is a modified
intervention response or a shared intervention response.

6. The method according to claim 1, wherein said method
further includes a step of stopping said reading step if said
returned combined response is a retry.

7. The method according to claim 1, wherein said method
further includes a step of requesting a system data bus for
sourcing of said requested data by said second processing
unit before the return of said combined response.

8. The method according to claim 7, wherein said method 
further includes a step of sourcing said requested data by 
said second processing unit before the return of said com- 
bined response. 

9. A processing unit having a cache memory capable of
speculatively sourcing data within a multiprocessor
data-processing system, said processing unit comprising:

means for requesting data by a first processing unit within
said data-processing system; and
responsive to issuance of an intervention response from a
second processing unit within said data-processing
system indicating said second processing unit intends
to source said requested data to a system bus, means for
reading said requested data from said system bus prior
to a combined response from all said plurality of
processing units returns to said second processing unit.

10. The processing unit according to claim 9, wherein said
means for reading is a cache controller.

11. The processing unit according to claim 9, wherein said
means for reading further includes a means for reading said
requested data into a buffer.

12. The processing unit according to claim 9, wherein said
request for data includes a read request or a
read-with-intent-to-modify request.
13. The processing unit according to claim 9, wherein said
intervention response from a second processing unit is a
modified intervention response or a shared intervention
response.

14. The processing unit according to claim 9, wherein said
processing unit further includes a means for stopping said
reading by said reading means if said returned combined
response is a retry.

15. The processing unit according to claim 9, wherein said
processing unit further includes a means for requesting a
system data bus for sourcing of said requested data by said
second processing unit before the return of said combined
response.

16. The processing unit according to claim 15, wherein 
said processing unit further includes a means for sourcing 
said requested data by said second processing unit before the 
return of said combined response. 